(54) Title: AUTOMATIC TRACKING CAMERA CONTROL SYSTEM



(57) Abstract 

Methodology and circuitry for automatically effecting electronic camera
movement to track and display the location of a moving object, such as a person
presenting a talk to an audience. A fixed spotting camera (110) is used to capture a
field of view, and a moving tracking camera (120) with pan/tilt/zoom/focus functions
is driven (controller 520) to the present location of the moving object. Information
for driving the tracking camera is obtained with reference to the pixel difference
between a current image (300) and a previous image (200) within the field of view.
A tracking algorithm computes the information necessary to drive the tracking
camera from these pixel differences as well as data relative to the field of view of
the spotting camera and the present tracking camera position.
AUTOMATIC TRACKING CAMERA CONTROL SYSTEM



Field of the Invention 

This invention relates generally to camera control systems and, more
specifically, to a two-camera system which automatically tracks movement of an object
in the image space under control of a tracking algorithm.

Background of the Invention 



Audio/visual presentations given in a corporate setting are seldom
attended by as many people as would like to see and hear such presentations. It is often
inconvenient to travel to where the talk is given, and one must be free at the appointed
hour. Televising talks over a video network and/or making the video tape available for
later viewing are often viable alternatives for a corporation having many geographically
dispersed work locations.

Installation of commercial-grade television equipment in a large meeting
room or auditorium can transform such a location into a simple and cost-effective
television studio. If the equipment is neither overly elaborate nor difficult to operate, then
a single person can do the work usually assigned to two or more trained personnel. As a
result, it is economical and very convenient to record and telecast presentations made in
that room so they can be seen and heard at other locations, and even at different times if
desired.

One weakness of the one-operator system is that a person who walks
around during their presentation can present a significant work load to the system
operator, who must keep up with the movement of the person. This extra work load
becomes a distraction from the system operator's principal task of presenting the most
appropriate image to the remote audience or to the recording medium.

The prior art is devoid of teachings and suggestions for a video system
wherein a camera arrangement can track a presenter who paces and/or gesticulates, and
thereby provides to the system operator another image which may be appropriately
selected for immediate display or recording for later replay.

Summary of the Invention

Instead of requiring the operator to follow the presenter by physically
controlling the movement of the camera so as to make available an image at the system
control console which may then be displayed to the audience or captured on a recording
medium, the technique in accordance with the present invention effects camera
movement automatically and the operator merely selects the display image as one of
many that may be chosen for viewing.

Broadly, in accordance with the methodology of the present invention, a
moving object is tracked within the field of view of an electronic camera system, which
is illustratively composed of a fixed spotting camera and a movable tracking camera. In
the methodology, images are sequentially generated by the electronic camera system and
then stored as pixel representations during each of the scan intervals of the camera
system. A sequence of pixel differences between pairs of images is then evaluated to
produce pixel information indicative of movement of the object. From this pixel
information, a set of camera frames is then computed wherein the camera frames are
positioned to capture localized movement of the object within the field of view of the
electronic camera system. Finally, each camera frame is imaged with the electronic
camera system.

In one illustrative embodiment, the method includes, initially, a step of
generating and storing an image, called the previous image, as captured by the camera
system during a scan interval. Then, another image, called the current image, is captured
during a subsequent scan interval. The difference between the current image and the
previous image is evaluated on a per pixel basis to obtain pixel differences. The pixel
differences are used to locate a bounding box which outlines or borders the region within
the image of the spotting camera indicative of the movement by the object between the
current and previous images. The dimensionality of the bounding box is then utilized to
generate a tracking frame positioned on the moving object with the appropriate aspect
ratio of the camera system. The actual camera frame displayed by the camera system is
obtained with reference to the tracking frame, such as by enlarging the area of the
tracking frame. Then the current image is generally stored as a previous image, and the
cycle repeats, commencing with the capturing of a new current image.

The organization and operation of the invention will be better understood 
from a consideration of the detailed description of the illustrative embodiments thereof, 
which follow, when taken in conjunction with the accompanying drawing. 

Brief Description of the Drawing 
FIG. 1 depicts a typical environment in which the system in accordance
with the present invention may operate;

FIGS. 2-4 show, respectively, a previous image of information, a current
image of information wherein there is movement by the person relative to the previous
image, and a depiction of the difference between the previous and current images;

FIG. 5 depicts an illustrative system architecture in accordance with the
present invention;

FIG. 6 depicts a typical spotting image detected by the spotting camera;

FIG. 7 depicts the typical locations of a Threshold Box, a Maximum
Search Box, a Current Search Box, and a Block Box within the spotting image;

FIG. 8 depicts a flow diagram of the tracking algorithm in accordance 
with the present invention; 

FIG. 9 depicts the occurrence of a Bounding Box appearing totally within 
the Current Search Box and outside any Block Box; 

FIG. 10 depicts the occurrence of adjacent Sub-Bounding Boxes, one 
within a Block Box and the other within the Current Search Box but outside any Block 
Box; 

FIG. 11 depicts the occurrence of spaced-apart Sub-Bounding Boxes, one
within a Block Box and the other within the Current Search Box but outside any Block 
Box; 

FIG. 12 depicts the arrangement of the Current Search Box, the Tracking 
Frame, and the Camera Frame; and 

FIGS. 13 and 14 depict the direction of movement for the edges of the 
Current Search Box relative to the Camera Frame and Maximum Search Box for the 
cases of a high Confidence Measure and low Confidence Measure, respectively, as to the 
location of the person. 

Detailed Description

The basic characteristics of the operating environment where the 
automatic camera control system is used are as follows: 

(1) the object being tracked, such as a person, is the only object likely to be moving
in the range of potential scenes, except for pre-determined areas which will have motion
but where such motion in isolation is not of interest;

(2) only one object is to be tracked, or multiple objects which generally are in 
motion simultaneously are to be tracked; and 

(3) failure to stay "on target" is not a serious flaw, provided it does not happen too 
often nor persist too long. 

These characteristics then impose some limitations on the operating
environment, expressed as:

(i) The background of the area of potential scenes must be highly static; for example,
in the illustrative case, no drapes or plants constantly swaying, and no windows or
doorways through which outside traffic can be seen.

(ii) Objects which may move but are not of interest must be well separated, from the
camera's point of view, from the area where the tracked object moves. For instance, the
members of the audience must be well separated from the area where a person giving the
talk stands and moves.

(iii) Areas of potential scenes which may exhibit motion and where the tracked
object may also move can be identified and marked with a Block Box. As an example, a
person giving the talk may walk in front of a projection screen where slides are shown
and changed.

Placement of the Camera System in a Room 

The diagram of FIG. 1 is an overhead view of room 100, typical of an
environment where tracking camera system 109 in accordance with the present invention
is used. Person 101 - the object or target of interest - is standing in the center of stage
platform 102, in front of projection screen 103, and is facing audience area 105. Two
cameras compose electronic camera tracking system 109, namely spotting camera 110
and tracking camera 120. Typically, camera system 109 is suspended from the ceiling of
room 100.

Spotting camera 110 does not move and looks at the entire front of the
room, as shown by the dotted lines emanating from camera 110 forming a solid viewing
angle 111. Camera 110 does not pan, tilt, zoom or change focus.

Tracking camera 120 is remotely controllable and can be made to pan, tilt,
zoom and focus by commands issued by a computer (not shown). These commands
cause the image within the solid viewing angle 121, called the "tracking image," of the
tracking camera to change and thereby tend to keep the moving person 101 in view. The
focus function of the tracking camera may include an automatic focus capability.

The image captured within angle 111, called the "spotting image,"
determines the coverage of the system and hence the range of possible views which may
be seen by tracking camera 120. If person 101 leaves the spotting image, the tracking
camera will not be able to follow him or her outside of angle 111. As the system tends to
keep the tracking image centered on the target, it is possible for part of the tracking
image to view portions of the room which are outside of the spotting image when the
target is near an edge of the spotting image.

Finding the Moving Target

The requirements that only person 101 moves and that everything else in 
the view of spotting camera 110 be stationary, combined with the sequential-image 
nature of television, can be effectively utilized to develop a camera control algorithm to 
"lock onto" person 101. 

A television moving picture is a series of static images, much like motion
picture film. Each television image is made up of lines of pixels, where each pixel may
take any of a range of brightness values and color values which may be represented by
numbers. By way of a heuristic discussion (this is an idealized description of the process;
the actual algorithm is discussed in detail below), a non-moving television camera is
considered. If one television image is captured and a second image is captured shortly
thereafter, and then the image information of the second is subtracted from the first, all
the pixels that represent objects which did not move will be the same and their difference
will be zero. Those pixels whose brightness or color values changed between
the first and second image will have non-zero pixel differences. These pixel differences
indicate both where the person was and the present location of the (moving) person 101.
All pixels whose brightness and color values did not change from the first to the second
image represent objects which are not apparently moving and will have pixel differences
of zero.

The sequence of images of FIGS. 2-4 depicts the effect. The pictorial
information illustrated by spotting image 200 of FIG. 2 represents the first or "previous"
spotting image, and the pictorial information illustrated by image 300 of FIG. 3
represents the second or "current" spotting image. "Difference" image 400 of FIG. 4
depicts the absolute value of the pixel differences and, since person 101 was the only
object that moved, the double image 405 represents where person 101 was and is. All
other objects in the spotting image did not move and the difference image does not
represent them.

In practice, the double image 405 itself is not used directly. Instead, as
the lines of pixels are processed and corresponding pixels are subtracted, the x and y
coordinates of the highest, lowest, left-most, and right-most pixels with non-zero
differences are marked. These coordinates define the "Bounding Box" 410. The Bounding
Box 410 encloses the differences of the double image 405 and is used by the tracking
algorithm to determine how much panning, tilting, and zooming of tracking camera 120 is
appropriate to drive it to tracking image position 420 where it acquires a close-up picture
of person 101.
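
The idealized differencing described above can be sketched in a few lines. This is a minimal illustration only, assuming images are held as lists of rows of luminance values; the function name bounding_box and the sample images are hypothetical and do not come from the specification.

```python
def bounding_box(previous, current, threshold=0):
    """Return (xmin, ymin, xmax, ymax) enclosing every pixel whose absolute
    difference exceeds `threshold`, or None when no pixel changed.

    Assumption: images are equally sized lists of rows of luminance values,
    with y increasing downward as in a video scan.
    """
    xmin = ymin = xmax = ymax = None
    for y, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for x, (p, c) in enumerate(zip(prev_row, cur_row)):
            if abs(c - p) > threshold:
                if xmin is None:
                    # First changed pixel: it is also the top-most one.
                    xmin = xmax = x
                    ymin = ymax = y
                else:
                    xmin = min(xmin, x)
                    xmax = max(xmax, x)
                    ymax = y  # rows are visited top to bottom
    if xmin is None:
        return None
    return (xmin, ymin, xmax, ymax)


# A bright two-pixel "person" moves from column 1 to column 3.
previous = [
    [0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
current = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(previous, current)
```

In this example the box spans both the old and the new position of the moving pixels, which is the "double image" effect the text describes.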

Thus, based on the pixel differences of the previous spotting image 200
and the current spotting image 300 taken by spotting camera 110, tracking camera 120
captures the tracking image of person 101, as discussed in more detail below.

The mapping between the location of the Bounding Box within the
difference image 400 (and hence within the current spotting image 300) and the
commands sent to the Pan/Tilt/Zoom/Focus subsystem (presented below) of tracking
camera 120 is based on knowing the settings which aim the tracking camera 120 to
predetermined locations within the viewing angle 111 of spotting camera 110. Typically
these predetermined locations may be the four corners of the spotting image 200 of
spotting camera 110. These settings are determined during alignment procedures
accomplished during system installation. Interpolation allows for the pointing of
tracking camera 120 with some accuracy at person 101.
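
The text does not say which interpolation is used. One plausible reading, offered purely as an assumption, is bilinear interpolation between the pan/tilt settings recorded at the four corners of the spotting image. The function name, the corner keys, and the angle values below are hypothetical.

```python
def interpolate_pan_tilt(x, y, width, height, corners):
    """Bilinearly interpolate a (pan, tilt) setting for pixel (x, y).

    Assumption: `corners` maps 'tl', 'tr', 'bl', 'br' to the (pan, tilt)
    settings that aim the tracking camera at the corresponding corner of a
    spotting image of size width x height pixels.
    """
    u = x / (width - 1)   # 0.0 at the left edge, 1.0 at the right edge
    v = y / (height - 1)  # 0.0 at the top edge, 1.0 at the bottom edge

    def lerp(a, b, t):
        return a + (b - a) * t

    top = tuple(lerp(a, b, u) for a, b in zip(corners["tl"], corners["tr"]))
    bottom = tuple(lerp(a, b, u) for a, b in zip(corners["bl"], corners["br"]))
    return tuple(lerp(a, b, v) for a, b in zip(top, bottom))


# Hypothetical alignment data, in degrees, for a 641 x 481 spotting image.
corners = {"tl": (-20, 10), "tr": (20, 10), "bl": (-20, -10), "br": (20, -10)}
center = interpolate_pan_tilt(320, 240, 641, 481, corners)
top_left = interpolate_pan_tilt(0, 0, 641, 481, corners)
```

A real installation would aim at the center of the Bounding Box or Camera Frame and might need a non-linear correction for lens geometry; this sketch ignores both.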

System Architecture

System architecture 500 of an illustrative embodiment in accordance with 
the present invention is depicted in FIG. 5, and the operation of system 500 is succinctly 
described as follows: 

(1) Video images (an exemplary image is shown by depiction 531) from spotting
camera 110 are sequentially captured by Video Analog-to-Digital (A-to-D)
converter 511 in computer 510, and stored in one of the frame buffers 512.

(2) Processor 513 retrieves necessary data from a frame buffer 512, analyzes such data,
and computes the action, based on a tracking algorithm discussed in detail below, to
be effected by tracking camera 120.

(3) As an optional side-effect, processor 513 may develop an image displaying
information related to the tracking algorithm. That image is transferred into Video
Digital-to-Analog (D-to-A) converter 514 via one of the frame buffers 512 and
displayed on a video monitor (not shown); such image display information is shown
by depiction 552.

(4) The actions required of tracking camera 120 are communicated to
Pan/Tilt/Zoom/Focus controller 520 as commands over interface bus 515.
Controller 520 may sometimes respond with position data in response to the
commands, and this position data is returned to processor 513 via interface bus 515.

(5) Controller 520 translates those commands into drive signals to perform the Pan/Tilt
control and the Zoom/Focus control; these drive signals are transmitted to tracking
camera 120 via leads 521 and 522, respectively. The signal on lead 521 is delivered
to Pan/Tilt head 121 of tracking camera 120, whereas the Zoom/Focus signal on
lead 522 is delivered to Zoom/Focus subsystem 122 of tracking camera 120. The
Pan/Tilt head 121 may sometimes respond with position data in response to the
commands, and this position data is returned to controller 520 via lead 521. The
Zoom/Focus subsystem 122 may sometimes respond with position data in response
to the commands, and this position data is returned to controller 520 via lead 522.

(6) Pan/Tilt head 121 and Zoom/Focus subsystem 122 respond to these control signals,
driving the tracking camera to tracking image position 420. The tracking
camera 120 thus acquires a tracking image of the person 101 (depiction 553 is
exemplary) which is transmitted via lead 123 for use. For example, lead 123 may go
to system control console 540 where it might become a camera image selected by
the operator for display or taping purposes.

Discussion of the Operating Environment

Before the Tracking Algorithm is described in detail, aspects of the
operating environment necessary to understanding the algorithm are discussed.

A typical image taken by spotting camera 110, designated spotting
image 600, is depicted in FIG. 6. Of particular note is that members 105 of the audience
in region 605 are visible within the spotting image and that person 101 can walk in front
of the projection screen 103. Both people in the audience and the projection screen may
present motion which should normally not be tracked.

Particular areas of interest are designated within spotting image 600,
called "boxes", as illustrated in FIG. 7. Specifically, the following boxes are defined:
Threshold Box 705, Maximum Search Box 710, and an optional Block Box 715. Each
box is defined to the tracking algorithm after spotting camera 110 is installed in the room
and remains constant. Defining the boxes is only required once in most installations. In
situations where the system must deal with very different environments, such as where
the same room has radically different setups and uses, it will be necessary to define
several sets of boxes and select the appropriate set for each use. Some uses may require
no Block Box 715; others may require more than one Block Box 715.

Threshold Box 705 is placed so that it covers an area of the image which
is unlikely to see motion, such as the ceiling of the room. The area within Threshold
Box 705 should be illuminated to approximately the same extent as the area within the
Maximum Search Box 710.

Maximum Search Box 710 defines the maximum area within the spotting
image 600 where control system 109 will attempt to discern motion. Things and people
which are likely to move and are not to be tracked, such as members of the audience,
should not be within the Maximum Search Box. In FIG. 7, Maximum Search Box 710 is
sized from about knee-height of person 101 to as high as the tallest person might reach.
The bottom of Maximum Search Box 710 is well above the heads of the audience, so that
the heads and raised hands of the audience are unlikely to be within Maximum Search
Box 710.

Within the Maximum Search Box 710 is the Current Search Box 720.
The Current Search Box is the area within which Pixel Differences are computed; it
changes in size as the algorithm runs, as described below. It is never larger than the
Maximum Search Box 710.

Block Box 715 defines an area at least partially within the Maximum
Search Box 710 where there is likely to be motion which is usually, but not always,
ignored. For example, when person 101 changes slides by pushing a button on
lectern 104, then changes in the images on projection screen 103 are within Maximum
Search Box 710 and therefore would be seen as motion. It is preferable if such motion is
not misinterpreted as being the person moving. The algorithm ignores motion within
Block Box 715, except when other motion within the Current Search Box is adjacent to
it, as described below.

In this illustrative embodiment, each of these boxes is rectangular and 
there is only one of each. In general, the shape of each area could be arbitrary and there 
could be multiple instances of each of these within spotting image 600. 

Details about the roles of these boxes 705, 710, 715, and 720 will be 
discussed during the elucidation of the Tracking Algorithm. 






The Tracking Algorithm 

In order to describe the algorithm concisely, the following terms are
defined:

Scan Line Length - the number of pixels in an image scan line.

Pixel Value - the luminance (brightness) and color values of each pixel in the spotting
image. Typically luminance values range from zero (representing black) through the
maximum pixel value (representing white). A typical maximum luminance pixel
value is 255. (The illustrative embodiment does not elucidate use of color
information; however, the algorithm may use color information if it is available.)

Pixel Difference - the absolute value of the difference between two Pixel Values at the
same location between the Current Image and the Previous Image.

Rectangle - a rectangular area of the spotting image, defined by the xmin, xmax, ymin,
ymax coordinates from the video image wherein: x corresponds to the number of
the pixel across a scan line, with xmin on the left and xmax on the right; and y
corresponds to the number of the scan line, with ymin at the top and ymax at the
bottom.

Difference Count - the number of pixels within a Rectangle with Pixel Differences
above the Current Threshold.

Bounding Box - the minimum sized Rectangle containing all the pixels with Pixel
Differences which are above the Current Threshold.

Sub-Bounding Box - a Bounding Box separated from another Bounding Box, such as a
Sub-Bounding Box totally within a Block Box and another Sub-Bounding Box
totally outside the same Block Box.

Maximum Search Box - the Rectangle defining the maximum area within the Spotting
Image wherein motion is sought.

Current Search Box - a Rectangle, no larger than the Maximum Search Box and no
smaller than the Camera Frame, wherein movement by the object is being searched.

Threshold Box - a Rectangle within the Spotting Image wherein motion is most
unlikely; used in determining Current Noise Level.

Current Image - the copy of the latest digitized spotting camera image stored by
processor 513.

Previous Images - copies of earlier, retained digitized spotting camera images stored by
processor 513.

Current Noise Level - the largest Pixel Difference in the Threshold Box between the
Current Image and the most recent Previous Image.

Threshold Bias - a quantity added to the Current Noise Level as measured within the
Threshold Box to obtain the Current Threshold.

Current Threshold - the sum of the Current Noise Level and the Threshold Bias.

Tracking Frame - a Rectangle with the aspect ratio of the tracking camera image to which
an ideal tracking camera would theoretically be driven.

Camera Frame - a Rectangle with the aspect ratio of the tracking camera image to which
tracking camera 120 is actually driven. Ideally, the algorithm should drive the
tracking camera 120 so that it images all of the Camera Frame and no more.

Confidence Measure - a numerical quantity used as an indication of how much the
algorithm believes the current Bounding Box represents the current position of the
target being followed. Typically a value between 0 and 100.

Minimum Confidence Level - the Confidence Measure at which Previous Images used
in the algorithm may be discarded.

Extra Width Percentage - the fixed percentage added to the width and height of the
Tracking Frame to create the Camera Frame.
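
As an illustration of the last definition, the sketch below derives a Camera Frame from a Tracking Frame. It assumes the added width and height are split evenly on both sides so that the two frames share a center; the definitions do not state this, and the function name camera_frame is hypothetical.

```python
def camera_frame(tracking_frame, extra_width_percentage=100):
    """Enlarge a Tracking Frame (xmin, ymin, xmax, ymax) into a Camera Frame.

    Assumption: the extra width and height are distributed equally on both
    sides, so the Camera Frame stays centered on the Tracking Frame. Growing
    width and height by the same percentage preserves the aspect ratio.
    """
    xmin, ymin, xmax, ymax = tracking_frame
    extra_w = (xmax - xmin) * extra_width_percentage / 100
    extra_h = (ymax - ymin) * extra_width_percentage / 100
    return (xmin - extra_w / 2, ymin - extra_h / 2,
            xmax + extra_w / 2, ymax + extra_h / 2)


# An 80 x 60 (4:3) Tracking Frame enlarged by 100 percent becomes 160 x 120.
frame = camera_frame((100, 60, 180, 120), extra_width_percentage=100)
```

A practical version would also clip the result to the spotting image, which is omitted here.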

The following steps determine the initial settings of the Tracking 
Algorithm and are typically done at system installation. 

(a) Mount, align and adjust spotting camera 110 so spotting image 700 images at least the
entire area where tracking camera 120 should track person 101.

(b) Mount, align and adjust tracking camera 120 and its pan/tilt head 121 so that they can
image all areas of spotting image 700 at the maximum telephoto (greatest
magnification) zoom. The algorithm performs best when the lenses of spotting
camera 110 and tracking camera 120 are as close to each other as possible. A typical
installation places the vertical lens axes so that they are coplanar, with the minimum
possible displacement between the horizontal lens axes.

(c) Define Threshold Box 705 to the algorithm.

(d) Define Maximum Search Box 710 to the algorithm.

(e) Define optional Block Boxes 715 to the algorithm.

(f) Define the correspondence between extreme points within spotting image 700, for
instance, the four corners, and the pan/tilt commands sent by processor 513 to the
Pan/Tilt/Zoom/Focus controller 520 to point the center of the tracking camera 120
image at the corners.

(g) Define the correspondence between the Zoom commands sent by the processor 513 to
the Pan/Tilt/Zoom/Focus controller 520 to zoom to the extremes of magnification
(maximum wide angle and maximum telephoto image) and the corresponding
Camera Frames. Typically this alignment is done with the Camera Frames at the
center of the spotting image.

(h) Define the system parameters:

(i) Threshold Bias, typically 10% of the maximum Pixel Value.

(ii) Minimum Confidence Level, typically 30.

(iii) Extra Width Percentage, typically 100.

The tracking algorithm comprises a loop having the steps discussed
in detail below with reference to flow diagram 800 in FIG. 8. For each step of the
tracking algorithm, the intent of that step is first described and then the processes which
accomplish it are discussed.

In the following steps, "Digitizing a spotting image" implies:

(a) Digitize an image from spotting camera 110 and store in digitized format in a
frame buffer 512.

(b) Copy this initial digitized image from a frame buffer 512 into the memory of
processor 513.

Processing Block 805: Initialize the Tracking Algorithm; Generate Previous
Image.

Accomplishes the following: 

(a) Set the Current Search Box 720 to the values for the Maximum Search Box 710.

(b) Digitize a spotting image as the Previous Image.

These steps ensure that the algorithm is initialized with appropriate values
and the processor 513 has at least one Previous Image within its memory to use in the
remainder of the algorithm.

Processing Block 810: Generate Current Image. 

Digitize a spotting image as the Current Image.

This step is required in order to have an image within the memory of
processor 513 to compare with the Previous Image.






WO 94/17636 




PCT/US94/00866 




Processing Block 820: Compute Current Threshold. 

The Pixel Difference is a combination of actual image differences due to
objects moving and video noise. The video noise may be contributed by several sources,
typically including:

(a) noise in the imaging detector within spotting camera 110,

(b) noise in the imaging circuitry within spotting camera 110, and

(c) noise in video Analog-to-Digital circuitry 511.

If not accounted for, video noise could cause Pixel Differences which
would appear to the algorithm as object motion. By calculating a Current Threshold
which is higher than the contribution of video noise to the Pixel Differences, the algorithm
can avoid mistakenly identifying video noise as motion.

As part of the setup procedure, a box in the image of spotting camera 110
is picked as the Threshold Box 705. This box is in a portion of the spotting image 700 of
camera 110 which is illuminated approximately the same as the Maximum Search
Box 710 but which is never expected to see motion. Therefore, any Pixel Differences
within the presumably static region encompassed by the Threshold Box can be assumed
to be due to video noise only. This video noise is assumed to be representative of the
video noise which occurs within the Maximum Search Box. The Current Threshold is set
as a function of this video noise.

For each pixel within the Threshold Box 705, calculate the Pixel
Difference between the Current Image and the Previous Image. The largest value found
is the Current Noise Level. Set the Current Threshold as the Current Noise Level plus
the Threshold Bias.
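
The threshold computation of Processing Block 820 can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's code: images are lists of rows of luminance values, Rectangles are (xmin, ymin, xmax, ymax) tuples with inclusive bounds, and the bias of 25 approximates 10% of a maximum Pixel Value of 255. The function name is hypothetical.

```python
def current_threshold(previous, current, threshold_box, threshold_bias):
    """Return Current Noise Level + Threshold Bias.

    The Current Noise Level is the largest Pixel Difference found inside the
    Threshold Box, a region assumed to contain no real motion.
    """
    xmin, ymin, xmax, ymax = threshold_box
    noise_level = 0
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            difference = abs(current[y][x] - previous[y][x])
            noise_level = max(noise_level, difference)
    return noise_level + threshold_bias


previous = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
current = [
    [12, 10, 7, 11],    # top row: small noise only (Threshold Box)
    [10, 200, 10, 10],  # real motion, outside the Threshold Box
    [10, 10, 10, 10],
]
threshold = current_threshold(previous, current,
                              threshold_box=(0, 0, 3, 0), threshold_bias=25)
```

Here the largest noise difference in the top row is 3, so the threshold is 28; the large change in the second row lies outside the Threshold Box and does not affect it.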

Processing Block 830: Determine Pixel Differences Between Current and
Previous Images.

Creating a double image 405 which is a good indication of where a
person 101 is now depends on that person having moved recently enough and far enough
for the difference between a Previous Image and the Current Image to create a
representative Bounding Box 410. A means to strengthen that difference is to keep
several Previous Images, ordered by increasing age relative to the Current Image. If a strong
difference occurs between the newest Previous Image and the Current Image, that is an
indication that there is a lot of recent motion and is sufficient to continue with this step.
If that recent difference shows only slight motion, such as when the person is moving
slowly or only moving one portion of their body, comparison between the next oldest
Previous Image and the Current Image will create a double image 405 which will show
the motion that has occurred over a longer period of time and hence may present a
stronger difference on which to base the rest of the algorithm.

When the person moves in an area only within the Current Search Box,
away from any Block Boxes 715, the Bounding Box 410 is defined by the Pixel
Differences of his or her double image 405 within the Current Search Box 720. FIG. 9
illustrates this effect, that is, Bounding Box 410 is totally within Current Search Box 720
and totally outside any Block Box 715.

But when person 101 walks in front of projection screen 103, as depicted
in FIG. 10, the portion of the double image 405 within the Block Box 715 cannot be
ignored. When there is a Sub-Bounding Box 1005 within Block Box 715 and a
Sub-Bounding Box 1010 outside the same Block Box 715, and they are adjacent to each
other, the two are merged, creating the composite Bounding Box 410 used for tracking.

If there is a significant distance between the two Sub-Bounding Boxes, as
depicted by FIG. 11, such as when changing the slide generates Sub-Bounding Box 1005
in the Block Box in addition to Sub-Bounding Box 1010 outside the same Block
Box 715, Sub-Bounding Box 1010 becomes the Bounding Box 410 used for tracking.

For each pixel within the Current Search Box 720, calculate the Pixel Difference of the
Current Image and the corresponding Previous Image. If the Pixel Difference is greater
than the Current Threshold, continue with the following steps:

a. If the pixel is within Block Box 715, record the minimum xmin and ymin and the
maximum xmax and ymax pixel coordinates and add one to the count of pixels found
above the Current Threshold within that Sub-Bounding Box 1005.

b. Otherwise, record the minimum xmin and ymin and the maximum xmax and ymax
pixel coordinates and add one to the count of pixels found above the Current
Threshold within that Sub-Bounding Box 1010.
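
Steps a and b can be sketched as a single scan that accumulates two Sub-Bounding Boxes and their Difference Counts. The sketch assumes a single rectangular Block Box, images held as lists of rows, and Rectangles as inclusive (xmin, ymin, xmax, ymax) tuples; all function names are hypothetical.

```python
def contains(box, x, y):
    """True when pixel (x, y) lies inside the inclusive rectangle `box`."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def grow(box, x, y):
    """Return `box` extended to include pixel (x, y); `box` may be None."""
    if box is None:
        return (x, y, x, y)
    return (min(box[0], x), min(box[1], y), max(box[2], x), max(box[3], y))


def scan_search_box(previous, current, search_box, block_box, threshold):
    """Return ((inside_box, inside_count), (outside_box, outside_count)).

    "Inside" collects above-threshold pixels within the Block Box; "outside"
    collects the remaining above-threshold pixels of the Current Search Box.
    """
    inside_box, inside_count = None, 0
    outside_box, outside_count = None, 0
    sx0, sy0, sx1, sy1 = search_box
    for y in range(sy0, sy1 + 1):
        for x in range(sx0, sx1 + 1):
            if abs(current[y][x] - previous[y][x]) > threshold:
                if contains(block_box, x, y):
                    inside_box = grow(inside_box, x, y)
                    inside_count += 1
                else:
                    outside_box = grow(outside_box, x, y)
                    outside_count += 1
    return (inside_box, inside_count), (outside_box, outside_count)


previous = [[0] * 6 for _ in range(4)]
current = [row[:] for row in previous]
current[1][1] = 50  # motion outside the Block Box
current[2][2] = 50  # motion outside the Block Box
current[0][5] = 50  # slide change inside the Block Box
inside, outside = scan_search_box(previous, current,
                                  search_box=(0, 0, 5, 3),
                                  block_box=(4, 0, 5, 3),
                                  threshold=10)
```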

Processing Block 840: Determine Bounding Box. 
If Sub-Bounding Boxes are found within both the Current Search Box and the
Block Box, it is now necessary to determine whether they should be combined. If the
Sub-Bounding Box within the Current Search Box is "near" the Sub-Bounding Box within
the Block Box, combine both Sub-Bounding Boxes to form a composite Bounding Box.
Otherwise, ignore the Sub-Bounding Box in the Block Box. Typically, Sub-Bounding
Boxes are considered "near each other" if the number of pixels separating the edges of
the Sub-Bounding Boxes is within 1% of the Scan Line Length.
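
The merge decision above might be sketched as follows. The sketch assumes boxes are
(xmin, ymin, xmax, ymax) tuples and measures the separation as the larger of the
horizontal and vertical gaps between the nearest edges; the text does not define the
separation more precisely, so that choice is an assumption. The function names are
hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (xmin, ymin, xmax, ymax)


def edge_gap(a: Box, b: Box) -> int:
    """Pixels separating the nearest edges of two boxes; 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)  # assumption: the larger axis gap is the separation


def determine_bounding_box(outside: Optional[Box], in_block: Optional[Box],
                           scan_line_length: int,
                           near_fraction: float = 0.01) -> Optional[Box]:
    """Combine the two Sub-Bounding Boxes when they are near each other."""
    if outside is None:
        return None  # assumption: a difference only inside the Block Box is not tracked
    if in_block is None:
        return outside
    if edge_gap(outside, in_block) <= near_fraction * scan_line_length:
        # Near each other: form the composite Bounding Box.
        return (min(outside[0], in_block[0]), min(outside[1], in_block[1]),
                max(outside[2], in_block[2]), max(outside[3], in_block[3]))
    # Far apart: ignore the Sub-Bounding Box in the Block Box.
    return outside
```

With a Scan Line Length of 640 pixels, the default 1% corresponds to a gap of at most
6.4 pixels.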

Processing Block 850: Determine the Tracking Frame. 

Typically, the aspect ratio of the tracking camera image is 4-wide to 3-high.
Since a person does not match the 4:3 aspect ratio, a compromise is required when
capturing his or her image.

Note that in the operating environment being addressed, the head of
person 101 is more appropriate to display than his or her feet. A reasonable compromise
is to position the top of the Tracking Frame 1210 proximate to the top of the Bounding
Box (which usually corresponds to the top of the person's head) and make the Tracking
Frame as wide as the Bounding Box, as depicted in FIG. 12.

However, because the algorithm is tracking the difference in the person's
position and not the outline of the person, the Bounding Box does not uniformly surround
the person's image. The algorithm smooths the size and position of Tracking
Frame 1210 to help keep the tracking camera 120 from "jumping around". Many
algorithms for smoothing the position of the Tracking Frame, such as averaging, may be
readily devised by those with skill in the art. The intent of smoothing the Tracking
Frame is both to keep it positioned on person 101, appropriately sized, and to respond
quickly when the person moves. The smoothing algorithm may also choose to enlarge
the Tracking Frame when the person moves around frequently within a relatively
confined area so that the person is always seen in the tracking camera image without
moving the tracking camera. Later, if the person stops moving so frequently, the
algorithm may choose to shrink the Tracking Frame to create a close-up image.
Accordingly, the algorithm is then used to:

Compute the smoothed y position of the top of the Bounding Box. Use 
the smoothed y as the top of Tracking Frame; 

Compute the smoothed width of the Bounding Box. Use the smoothed
width as the width of the Tracking Frame; and

Compute the smoothed x position of the center of the Bounding Box. Use 
the smoothed x as the desired horizontal center of the Tracking Frame. 
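
As one possible form of the averaging mentioned above, the three smoothed quantities
could be maintained with an exponential moving average. This is a sketch under stated
assumptions, not the specification's algorithm: the class name TrackingFrameSmoother
and the weight alpha are hypothetical, the Bounding Box is an (xmin, ymin, xmax, ymax)
tuple, and image y coordinates are assumed to grow downward so that ymin is the top.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class TrackingFrameSmoother:
    """Exponential moving average of the Tracking Frame (hypothetical helper)."""
    alpha: float = 0.25  # assumed weight given to the newest Bounding Box
    top: Optional[float] = None
    width: Optional[float] = None
    center_x: Optional[float] = None

    def _blend(self, old: Optional[float], new: float) -> float:
        # The first sample is taken as-is; later samples move part of the way.
        return new if old is None else old + self.alpha * (new - old)

    def update(self, box: Box) -> Tuple[float, float, float]:
        """Return the smoothed (top y, width, horizontal center)."""
        xmin, ymin, xmax, _ymax = box
        self.top = self._blend(self.top, float(ymin))
        self.width = self._blend(self.width, float(xmax - xmin))
        self.center_x = self._blend(self.center_x, (xmin + xmax) / 2.0)
        return self.top, self.width, self.center_x
```

A larger alpha responds faster when the person moves; a smaller alpha gives a steadier
Tracking Frame.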

As part of the smoothing, calculate a Confidence Measure of the
difference for use later. This Confidence Measure may be based on the number of Pixel
Differences that are above the Current Threshold within the Bounding Box. The
Confidence Measure may take into account the recent history of the Pixel Differences. If
there is a sudden, large increase in the number of Pixel Differences, it is likely to be due
to an event which is not a person moving, such as a change in room lighting. The
Confidence Measure discounts such large changes unless they persist.

Likewise, a very small number of Pixel Differences after a persistent
recent history of relatively large numbers of Pixel Differences is likely to be less
important and is also discounted, unless it persists for a time.
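
One way such a Confidence Measure might behave is sketched below. Everything in it is
an assumption made for illustration: the class name, the ratio of the current count to
the mean of recent counts, the spike_ratio that defines a "sudden, large increase", and
the persist count after which a sustained change is accepted are all hypothetical and
are not taken from the specification.

```python
from collections import deque


class ConfidenceMeasure:
    """Scores a difference count against recent counts (hypothetical sketch)."""

    def __init__(self, history: int = 8, spike_ratio: float = 4.0, persist: int = 3):
        self.recent = deque(maxlen=history)  # counts from recent cycles
        self.spike_ratio = spike_ratio       # assumed definition of a sudden increase
        self.persist = persist               # cycles before a spike is believed
        self.spike_run = 0

    def update(self, count: int) -> float:
        """Return a confidence in [0, 1] for this cycle's difference count."""
        if not self.recent:
            self.recent.append(count)
            return 1.0 if count > 0 else 0.0
        baseline = sum(self.recent) / len(self.recent)
        if baseline > 0 and count > self.spike_ratio * baseline:
            # Sudden, large increase: discount it unless it persists.
            self.spike_run += 1
            if self.spike_run < self.persist:
                return 0.0
            self.recent.clear()  # the change persisted; treat it as the new normal
        self.spike_run = 0
        self.recent.append(count)
        if baseline == 0:
            return 1.0 if count > 0 else 0.0
        # A count that is small relative to recent history yields low confidence
        # until the history itself adapts.
        return min(count / baseline, 1.0)
```

In this sketch a very small count is not rejected outright; it simply scores low until
enough small counts have entered the history.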

Processing Block 860: Drive Tracking Camera to the position defined by the 
Camera Frame. 

Even with the smoothing described in Processing Block 850, too many
fine adjustments to the tracking camera 120 image may become a distraction to those
watching. As depicted by FIG. 12, hysteresis is introduced by selectively adding a fixed
percentage (the Extra Width Percentage, typically 100%) to the width and height of the
Tracking Frame 1210, creating Camera Frame 1205. Pan/Tilt head 121 and Zoom/Focus
subsystem 122 are driven to capture Camera Frame 1205 with tracking camera 120.

Most motions of the Tracking Frame which do not take it outside the
Camera Frame do not cause the Camera Frame to move. Motions of the Tracking Frame
which do take it outside the Camera Frame cause the Camera Frame to move and hence
the Pan/Tilt Head 121 will be moved to point the tracking camera 120 appropriately (see
below).

When the Tracking Frame changes size so that the Camera Frame is no
longer the fixed percentage bigger than the Tracking Frame, the Camera Frame is
adjusted in size to make it conform and hence the zoom lens of the Zoom/Focus
subsystem 122 will be adjusted to change the magnification of the image captured by the
tracking camera 120 appropriately (see below).

Once the pan, tilt and zoom settings are determined, they are sent to the
Pan/Tilt/Zoom/Focus controller 520 which in turn causes the tracking camera 120 to
respond appropriately.
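
The hysteresis described above can be illustrated with a short sketch. It assumes
frames are (xmin, ymin, xmax, ymax) tuples and that the Camera Frame is centered on the
Tracking Frame, which the text does not state. As a simplification it snaps directly to
the new Camera Frame whenever either rule is violated, and it uses a hypothetical
size_tol tolerance so that small size changes do not trigger a zoom adjustment.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def expand(tracking: Box, extra: float = 1.0) -> Box:
    """Camera Frame: the Tracking Frame enlarged by `extra` (1.0 means 100%)."""
    xmin, ymin, xmax, ymax = tracking
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    half_w = (xmax - xmin) * (1 + extra) / 2
    half_h = (ymax - ymin) * (1 + extra) / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def update_camera_frame(camera: Box, tracking: Box, extra: float = 1.0,
                        size_tol: float = 0.05) -> Box:
    """Keep the Camera Frame unless the Tracking Frame leaves it or changes size."""
    target = expand(tracking, extra)
    target_w = target[2] - target[0]
    camera_w = camera[2] - camera[0]
    size_ok = abs(camera_w - target_w) <= size_tol * target_w
    if contains(camera, tracking) and size_ok:
        return camera  # hysteresis: no pan, tilt or zoom change
    return target      # re-point and re-size the tracking camera
```

The returned frame would then be converted to pan, tilt and zoom settings for the
controller; that conversion is outside this sketch.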

When tracking camera 120 and person 101 are both moving, the current
automatic focus technology often has difficulty getting a correct setting. Automatic
focusing of tracking camera 120 has been much more successful when activated only
after tracking camera 120 has stopped moving. Thus, if Camera Frame 1205 has been
moving and has now stopped and not moved for a short while (typically, a second or
two), processor 513 sends an AutoFocus command to the Zoom/Focus subsystem 122 via
the Pan/Tilt/Zoom/Focus controller 520.
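
The delayed AutoFocus rule can be sketched as a small state holder. The class name
AutoFocusTrigger, the settle_seconds default and the convention of passing the current
time in as an argument are assumptions made for this illustration.

```python
from typing import Optional


class AutoFocusTrigger:
    """Signals AutoFocus once the Camera Frame has been still for a while."""

    def __init__(self, settle_seconds: float = 1.5):
        self.settle_seconds = settle_seconds  # assumed "second or two"
        self.last_move: Optional[float] = None
        self.pending = False

    def update(self, frame_moved: bool, now: float) -> bool:
        """Return True when an AutoFocus command should be sent this cycle."""
        if frame_moved:
            self.last_move = now
            self.pending = True  # focus is needed once the camera settles
            return False
        if self.pending and now - self.last_move >= self.settle_seconds:
            self.pending = False  # send the command only once per stop
            return True
        return False
```

Passing the time as an argument keeps the sketch independent of any particular clock.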

Processing Block 870: Select the Current Search Box. 
When the algorithm has a clear indication of where the person is, the
assumption is made that this is the person of interest and there is no need to look
anywhere else. Under this condition, the algorithm ignores other motions within the
Maximum Search Box but distant from the person by shrinking the Current Search Box
to closely surround the person as long as it keeps seeing the person moving. If there is no
motion, the algorithm enlarges the Current Search Box looking for motion.

If there are Pixel Differences above the Current Threshold, the edges of
the Current Search Box are moved toward the Camera Frame in steps (typically, 5% of
the Scan Line Length), as depicted in FIG. 13; otherwise, the edges are moved toward the
Maximum Search Box in steps, as illustrated in FIG. 14.

The size of the steps with which the Current Search Box moves toward the
Camera Frame and toward the Maximum Search Box need not be the same and may be
installation dependent.
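
A minimal sketch of this stepping rule follows, assuming boxes are integer
(xmin, ymin, xmax, ymax) tuples. The function names and the two step fractions are
hypothetical; the 5% defaults follow the typical value given above, and the step toward
the Maximum Search Box is assumed equal only for the sake of the example.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (xmin, ymin, xmax, ymax)


def step_toward(value: int, target: int, step: int) -> int:
    """Move one edge toward its target without overshooting it."""
    if value < target:
        return min(value + step, target)
    return max(value - step, target)


def update_search_box(current: Box, camera_frame: Box, max_box: Box,
                      motion_seen: bool, scan_line_length: int,
                      shrink_fraction: float = 0.05,
                      grow_fraction: float = 0.05) -> Box:
    """Shrink toward the Camera Frame on motion, else grow toward the maximum."""
    target = camera_frame if motion_seen else max_box
    fraction = shrink_fraction if motion_seen else grow_fraction
    step = round(scan_line_length * fraction)
    return tuple(step_toward(c, t, step) for c, t in zip(current, target))
```

The sketch does not clip the result to the Maximum Search Box; an installation whose
Camera Frame can extend beyond it would need that extra check.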

Processing Block 880: Decide Whether or Not to Retain the Previous Images.

The algorithm relies on the image-to-image difference providing enough
motion to give a good idea of the person's position. The Confidence Measure expresses
the algorithm's assessment of how strongly the algorithm believes it knows the person's
current position. If the person is not moving much, the difference may not be very
representative of a person and the Confidence Measure would be low. One way to
increase the likelihood that there will be significant motion between images and a higher
Confidence Measure is to wait longer between images, increasing the time interval
between the Previous Images and the Current Image.

In the illustrative embodiment of the invention, the Confidence Measure is
based on the Difference Count within the Bounding Box relative to the Difference
Counts seen in recent cycles of the algorithm. Other embodiments of the Confidence
Measure are also possible.

If the Confidence Measure is lower than the Minimum Confidence
Measure, the decision is made to retain the Previous Images. If the Previous Images
are retained, then the process continues at Processing Block 810. If the Previous Images
are not retained, then the process continues at Processing Block 890.

Also, if the Previous Images are retained for a long period of time, a
high Confidence Measure may be due to having a Previous Image which contains a
person within the Current Search Box and a Current Image which does not contain a
person within the Current Search Box. (This could be the result of a person completely
leaving the Current Search Box within the time between the capturing of the Previous
Images and Current Image.) For this reason, if the Previous Images have not been
updated for a long time, typically 10 seconds, the answer is set to "No" and the process
continues at Processing Block 890.
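
The decision in this block reduces to two tests, sketched below. The function and
parameter names are hypothetical; the 10-second default follows the typical value given
above.

```python
def retain_previous_images(confidence: float, min_confidence: float,
                           seconds_since_update: float,
                           max_retain_seconds: float = 10.0) -> bool:
    """Return True to keep the Previous Images (continue at Processing Block 810)."""
    if seconds_since_update >= max_retain_seconds:
        # Stale Previous Images: the answer is "No" regardless of confidence.
        return False
    # Low confidence: wait longer so that more motion accumulates between images.
    return confidence < min_confidence
```

A result of False corresponds to continuing at Processing Block 890.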

Processing Block 890: Update Previous Images. 

As seen in Processing Block 830, there may be more than one Previous
Image. To update the Previous Images, discard the oldest Previous Image, make the
next oldest Previous Image the oldest Previous Image, and so on until the Current Image
becomes the most recent Previous Image.
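
This update is the behavior of a fixed-length queue. A minimal sketch, assuming the
Previous Images are held oldest-first in a Python deque created with a maxlen equal to
the number of Previous Images kept (the function name is hypothetical):

```python
from collections import deque


def update_previous_images(previous: deque, current) -> None:
    """Make the Current Image the most recent Previous Image."""
    # A deque created with maxlen discards its oldest entry when a new
    # entry is appended to a full deque.
    previous.append(current)
```

For example, a deque holding images A, B and C with maxlen 3 holds B, C and D after
image D is appended.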

It is to be understood that the above-described embodiments are simply
illustrative of the principles in accordance with the present invention. Other
embodiments may be readily devised by those skilled in the art which may embody the
principles in spirit and scope. Thus, it is to be further understood that the circuit
arrangements and concomitant methods described herein are not limited to the specific
forms shown by way of illustration, but may assume other embodiments limited only by
the scope of the appended claims.

What is claimed is:

1. A method for tracking a moving object within the field of view of an
electronic camera system, the method comprising the steps of
    sequentially generating and storing images captured by the electronic camera
system during camera scan intervals to produce pixel representations of the images,
    generating a sequence of pixel differences between two of the images to produce
pixel information indicative of movement of the object,
    determining a sequence of camera frames from the pixel information, the camera
frames being positioned to capture localized movement of the object within the field of
view of the electronic camera system, and
    sequentially capturing the camera frames with the electronic camera system.

2. A method for tracking a moving object within the field of view of an
electronic camera system, the method comprising the steps of
    (a) generating and storing as a previous image the image captured by the
electronic camera system during a scan interval,
    (b) generating and storing as a current image the image captured by the electronic
camera system during a subsequent scan interval,
    (c) determining pixel differences between the current image and one previous
image,
    (d) determining a bounding box from the pixel differences, the bounding box
being indicative of the movement detected between the current image and the one
previous image,
    (e) generating a tracking frame with reference to the bounding box, the tracking
frame being indicative of the location of the moving object within the field of view of the
electronic camera system,
    (f) determining a camera frame from the tracking frame and detecting the camera
frame with the electronic camera of the electronic camera system,
    (g) storing the current image as the last previous image, and
    (h) returning to step (b).

3. The method as recited in claim 2 wherein the electronic camera system is
composed of a fixed spotting camera and of a movable tracking camera, wherein
    step (a) includes the step of generating and storing as a previous image the image
captured by the spotting camera during a scan interval,
    step (b) includes the step of generating and storing as a current image the image
captured by the spotting camera during a subsequent scan interval,
    step (e) includes the step of generating a tracking frame with reference to the
bounding box, the tracking frame being indicative of the location of the moving object
within the image of the spotting camera, and
    step (f) includes the step of determining a camera frame from the tracking frame
and detecting the camera frame with the tracking camera.

4. A method for tracking a moving object within the field of view of an
electronic camera system composed of a spotting camera and a tracking camera, the
method comprising the steps of
    (a) generating and storing as a previous image the image captured by the spotting
camera during a scan interval,
    (b) generating and storing as a current image the image captured by the spotting
camera during a subsequent scan interval,
    (c) determining pixel differences between the current image and one previous
image,
    (d) determining a bounding box from the pixel differences, the bounding box
being indicative of the movement detected between the current image and the one
previous image,
    (e) generating a tracking frame with reference to the bounding box, the tracking
frame being indicative of the location of the moving object within the image of the
spotting camera,
    (f) determining a camera frame from the tracking frame and detecting the camera
frame with the tracking camera,
    (g) storing the current image as the last previous image, and
    (h) returning to step (b).

5. The method as recited in claim 4 further comprising the step, executed after
said step (f), of selecting a current search area from the image captured by the spotting
camera to locate the moving object.

6. The method as recited in claim 4 further comprising the step, executed after
said step (b), of determining a threshold box within the field of view to produce a current
threshold value representative of ambient image conditions, said pixel differences then
computed with reference to the current threshold value.

7. A method for tracking a moving object within the field of view of an
electronic camera system composed of a spotting camera and a tracking camera, the
method comprising the steps of
    (a) generating and storing as a previous image the image captured by the spotting
camera during a scan interval,
    (b) generating and storing as a current image the image captured by the spotting
camera during a subsequent scan interval,
    (c) determining pixel differences between the current image and one previous
image,
    (d) if the pixel differences are below a predetermined threshold, returning to step
(b); otherwise, continuing with the next step,
    (e) determining a bounding box from the pixel differences, the bounding box
being indicative of the movement detected between the current image and the one
previous image,
    (f) generating a tracking frame with reference to the bounding box, the tracking
frame being indicative of the location of the moving object within the image of the
spotting camera,
    (g) determining a camera frame from the tracking frame and detecting the camera
frame with the tracking camera,
    (h) storing the current image as the last previous image, and
    (i) returning to step (b).

8. The method as recited in claim 7 further comprising the step, executed after
said step (g), of selecting a current search area from the image captured by the spotting
camera to locate the moving object.

9. The method as recited in claim 7 further comprising the step, executed after
said step (b), of determining a threshold box within the field of view to produce a current
threshold value representative of ambient image conditions, said pixel differences then
computed with reference to the current threshold value.

10. Circuitry for tracking a moving object within the field of view of an
electronic camera system composed of a spotting camera and a tracking camera, the
circuitry comprising
    means for sequentially generating and for storing, as a previous image, the image
captured by the spotting camera during a scan interval and, as a current image, the image
captured by the spotting camera during a subsequent scan interval,
    means, coupled to said means for sequentially generating and for storing, for
determining the pixel differences between the current image and one previous image,
    means, coupled to said means for determining, for generating a bounding box
from the pixel differences, the bounding box being indicative of the movement detected
between the current image and the one previous image,
    means, coupled to said means for generating a bounding box, for generating a
tracking frame from the bounding box, the tracking frame being indicative of the location
of the moving object within the image of the spotting camera, and
    means, coupled to said means for generating a tracking frame, for determining a
camera frame from the tracking frame, and
    means, coupled to said means for determining a camera frame, for capturing the
camera frame by the tracking camera.

11. The circuitry as recited in claim 10 further comprising means, responsive
to the spotting camera and coupled to said means for sequentially generating and for
storing, for selecting a current search area from the image captured by the spotting
camera to locate the moving object.

12. An electronic camera system for tracking the movement of an object within
the field of view of the camera system, the system comprising
    a spotting camera for capturing video images within the field of view,
    a tracking camera having pan, tilt, zoom, and focus functions,
    a tracking camera controller for controlling said functions,
    a video controller, coupled to said spotting camera, for receiving and for
converting said video images to produce a set of digitized images,
    a frame buffer, coupled to said video controller, for storing said digitized images,
    a processor, coupled to said frame buffer and said tracking camera controller, for
processing said digitized images to generate pixel differences from said digitized images,
said pixel differences being representative of localized movement of the object within the
field of view of the camera system, said pixel differences being processed by said
processor to produce a sequence of camera frames and, correspondingly, processor
control signals for controlling said tracking camera controller, said tracking camera
controller responsive to said control signals to drive said pan, tilt, zoom, and focus
functions so that said tracking camera captures the moving object.

13. The system as recited in claim 12 wherein said processor includes means
for storing and executing a tracking algorithm having said digitized images as inputs and
said control signals as outputs.